Dandan Tu

Huawei Technologies Co., Ltd., Beijing, China

Reinforcement Learning with Robust Rubric Rewards

May 28, 2026
Via arXiv

Claw-Anything: Benchmarking Always-On Personal Assistants with Broader Access to User's Digital World

May 25, 2026
Via arXiv

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

May 12, 2026
Via arXiv

SeePhys Pro: Diagnosing Modality Transfer and Blind-Training Effects in Multimodal RLVR for Physics Reasoning

May 10, 2026
Via arXiv

Culture-Aware Machine Translation in Large Language Models: Benchmarking and Investigation

Apr 27, 2026
Via arXiv

Visual Preference Optimization with Rubric Rewards

Apr 14, 2026
Via arXiv

Schema-Aware Planning and Hybrid Knowledge Toolset for Reliable Knowledge Graph Triple Verification

Apr 05, 2026
Via arXiv

Not All Tokens See Equally: Perception-Grounded Policy Optimization for Large Vision-Language Models

Apr 02, 2026
Via arXiv

CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification

Mar 02, 2026
Via arXiv

Scene-Aware Memory Discrimination: Deciding Which Personal Knowledge Stays

Feb 12, 2026
Via arXiv